Take Home Exercise 1
pacman::p_load(sf, tmap, tidyverse, lubridate, arrow, maptools, raster, spatstat)
1. Setting the Scene
In recent years, ride-hailing applications have become a prominent part of our everyday lives. This can be attributed to the widespread adoption of smartphones, which enables the collection of large volumes of location-based data.
Grab is Southeast Asia’s leading ride-sharing company, and in this exercise we deep dive into Grab’s dataset. The dataset is sampled from Grab drivers’ trajectories, with the drivers’ personal information encrypted and the real start/end locations removed.
These datasets provide us with valuable insights into the spatio-temporal characteristics of human mobility. In the context of smart cities and urban planning, analysing human mobility data through GIS methods can be transformative, allowing urban planners and policymakers to gain a comprehensive understanding of how people interact with their surroundings.
2. Data Used
| Data Set | Description | Source |
|---|---|---|
| MPSZ-2019 | A polygon feature data set providing URA’s Master Plan 2019 Planning Subzone boundary data | data.gov.sg |
| gis_osm_roads_free_1 | A line feature data set that includes the road network data of Singapore, Malaysia and Brunei | https://download.geofabrik.de/asia/malaysia-singapore-brunei.html |
| GrabPosisi | A human mobility data set released by Grab, providing information on Grab journeys in Singapore | Grab |
3. Objectives
In this study, we apply appropriate spatial point pattern analysis methods to discover the geographical and spatio-temporal distribution of Grab ride-hailing service locations in Singapore.
Some specific tasks and learning objectives in this exercise are:
- Using appropriate functions of sf and tidyverse to prepare geospatial data layers in sf tibble data frames
- Using the extracted data to derive traditional Kernel Density Estimation (KDE) layers
- Using the extracted data to derive Network Kernel Density Estimation (NKDE) layers
- Using appropriate tmap functions to display the kernel density layers on an OpenStreetMap basemap of Singapore
- Describing the spatial patterns revealed by the kernel density maps
4. Installing and Loading the R packages
4.1 Packages Used
The R packages used in this exercise are:
- sf: provides an efficient framework for importing, managing and processing vector-based geospatial data in R
- spatstat: provides a wide range of useful functions for point pattern analysis, which we will use to derive the Kernel Density Estimation (KDE) layers
- raster: reads, writes, manipulates, analyses and models gridded spatial data (i.e. rasters), which we will use to convert the image output generated by spatstat into raster format
- maptools: provides a set of tools for manipulating geographic data, which we will mainly use to convert Spatial* objects into ppp format
- tmap: provides functions for plotting cartographic-quality static point pattern maps or interactive maps using the Leaflet API
- arrow: exposes an interface to the Arrow C++ library, which we will use to read Parquet files into the R environment
- lubridate: simplifies the handling of date-time objects in R and is a member of the tidyverse family
- tidyverse: a collection of R packages designed to streamline the process of data analysis and manipulation in R
All these packages are loaded into the R environment using the pacman::p_load() call shown at the top of this exercise.
5. Geospatial Data
5.1 Importing Geospatial Data
In this section, we will use st_read() to read our MPSZ 2019 layer as a simple features data frame. We can convert it to other objects later on if necessary!
mpsz_sf <- st_read(dsn = "data/MPSZ-2019",
                   layer = "MPSZ-2019")

Reading layer `MPSZ-2019' from data source
`C:\jaymieseet\IS415-GAA-JS\Take-home_Ex\Take-home_Ex01\data\MPSZ-2019'
using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS: WGS 84
sg_roads <- st_read(dsn = "data/OSM",
                    layer = "gis_osm_roads_free_1")

Reading layer `gis_osm_roads_free_1' from data source
`C:\jaymieseet\IS415-GAA-JS\Take-home_Ex\Take-home_Ex01\data\OSM'
using driver `ESRI Shapefile'
Simple feature collection with 1759836 features and 10 fields
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: 99.66041 ymin: 0.8021131 xmax: 119.2601 ymax: 7.514393
Geodetic CRS: WGS 84
5.2 Geospatial Data Pre-processing
5.2.1 Changing the Reference System to the Singapore National Projected Coordinate System
When we inspect the CRS, the MPSZ 2019 layer that we just imported is in WGS 84, a geographic coordinate system. Since our analysis is focused on Singapore, we should transform it to the Singapore projected coordinate system, SVY21 (EPSG: 3414), using st_transform().
mpsz3414 <- st_transform(mpsz_sf, crs = 3414)
st_crs(mpsz3414)

Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
sg_roads3414 <- st_transform(sg_roads, crs = 3414)
st_crs(sg_roads3414)

Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
Now we can see that the Projected CRS has been changed to SVY21!
5.2.2 Initial Visualisation
Since the MPSZ map is divided into many subzones, we can use st_union() to combine all the polygons into the outline of Singapore for a cleaner visualisation.
sg_sf <- mpsz3414 %>%
st_union()
plot(sg_sf)
From this visualisation, we can see small islands surrounding the main island of Singapore. For this exercise, we will exclude the surrounding islands from the analysis, as our focus is specifically on understanding human mobility within mainland Singapore, and it is reasonable to assume that Grab services do not extend to these islands.
5.2.3 Filtering Out Surrounding Islands
In order to filter out these islands from our data set, we can inspect the attribute table of the mpsz3414 data set that has been loaded into our R environment.
When we look at the SUBZONE_N or PLN_AREA_N fields, we can see that the names include the word ‘ISLAND’, which suggests that these might be the rows of data we would like to exclude.



However, to further confirm this, I loaded this ESRI shapefile into QGIS in order to accurately identify the names of the islands. As you can see from the screenshot below, it is indeed the case.

Hence, now all we need to do is to exclude these rows from our mpsz3414 data set with the code chunk below. Then, we can plot out the map again to find out whether this is successful!
sg_sf <- mpsz3414[!(mpsz3414$PLN_AREA_N %in% c('NORTH-EASTERN ISLANDS', 'SOUTHERN ISLANDS', 'WESTERN ISLANDS')), ]
plot(st_geometry(sg_sf))
Now, we can see that the islands have been excluded!
When we use plot(sf_object), we plot the entire sf object, producing one map per attribute column. Hence, if we call plot(sg_sf) on the filtered data frame above, we get 6 different visualisations, one for each attribute. Since we only want the outline of the map and are not focusing on the individual attributes, we can use plot(st_geometry(sg_sf)) to plot only the geometries.
In contrast, st_union() returns a single combined geometry (an sfc object) that merges all the geometries of the sf object and drops the attributes. Thus, calling plot() on the output of st_union() gives only one visualisation, displaying the unioned geometry, and there is no need to use st_geometry().
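To make this concrete, here is a minimal, self-contained sketch (using hypothetical toy squares, not the MPSZ data) of the three plotting behaviours described above:

```r
library(sf)

# Two unit squares with two attribute columns (toy data)
sq1 <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
sq2 <- st_polygon(list(rbind(c(1, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 0))))
toy_sf <- st_sf(name = c("A", "B"),
                value = c(10, 20),
                geometry = st_sfc(sq1, sq2))

plot(toy_sf)               # one panel per attribute column (two panels here)
plot(st_geometry(toy_sf))  # a single panel showing only the geometries
plot(st_union(toy_sf))     # a single panel: one combined geometry, no attributes
```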
6. Aspatial Data Set
6.1 Importing Grab-Posisi Data Set
The GrabPosisi data set we are using consists of 100 parts in total. Let’s import the first ten parts using read_parquet() of the arrow package, which reads the Parquet format into R. By default, the output is a tibble data frame.
grab00 <- read_parquet("data/GrabPosisi/part-00000-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")
grab01 <- read_parquet("data/GrabPosisi/part-00001-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")
grab02 <- read_parquet("data/GrabPosisi/part-00002-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")
grab03 <- read_parquet("data/GrabPosisi/part-00003-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")
grab04 <- read_parquet("data/GrabPosisi/part-00004-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")
grab05 <- read_parquet("data/GrabPosisi/part-00005-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")
grab06 <- read_parquet("data/GrabPosisi/part-00006-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")
grab07 <- read_parquet("data/GrabPosisi/part-00007-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")
grab08 <- read_parquet("data/GrabPosisi/part-00008-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")
grab09 <- read_parquet("data/GrabPosisi/part-00009-8bbff892-97d2-4011-9961-703e38972569.c000.snappy.parquet")

6.2 Data Preparation
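As an aside, the ten read_parquet() calls above can be replaced with a programmatic read-and-bind. This is a sketch, assuming all parts in data/GrabPosisi follow the same part-*.snappy.parquet naming pattern shown above:

```r
library(arrow)
library(purrr)

# List every Parquet file in the folder...
parquet_files <- list.files("data/GrabPosisi",
                            pattern = "\\.parquet$",
                            full.names = TRUE)

# ...then read and row-bind them all in one pass
grab_df <- map_dfr(parquet_files, read_parquet)
```

This scales to all 100 parts without writing 100 assignments, at the cost of holding the combined data in memory.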
To work with this data set efficiently, we combine all the parts into one data frame using rbind(), a base R function that combines objects by row-wise binding. Then, we can review the data types using glimpse() from the dplyr package, which displays the structure of the tibble data frame.
grab_df <- rbind(grab00, grab01, grab02, grab03, grab04, grab05, grab06, grab07, grab08, grab09)
glimpse(grab_df)

Upon inspection of grab_df, pingtimestamp is in integer format, which is not the correct data type: it stores a UNIX timestamp that we have to convert to a date-time format.
grab_df$pingtimestamp <- as_datetime(grab_df$pingtimestamp)
glimpse(grab_df)

Now we can see that the data type is POSIXct, which represents date and time values, including both the date and the time of day.
6.2.1 Extracting trip origins
The data set includes all the details of a trip, including all the intermediate check points. For this analysis, our focus will be solely on the origin and destination locations, as we want to learn which areas are most frequented by ride-hailing services.
In this code chunk, there are several steps to take note of:
- group_by() of dplyr: groups the records according to the values of trj_id
- arrange() of dplyr: sorts the rows by the pingtimestamp field, in ascending order by default
- filter() of dplyr: retains all rows that meet the selection criteria, i.e. row_number() == 1
  - Since the data is already sorted by time, the first row of each group carries the earliest timestamp of a recorded trip, which gives us the origin location
- mutate() of dplyr: derives new fields by applying functions
  - wday() of lubridate: returns the day of the week; with label = TRUE and abbr = TRUE it returns the abbreviated day name (e.g. Mon)
  - hour() of lubridate: returns the hour of the day
  - mday() of lubridate: returns the day of the month
origin_df <- grab_df %>%
group_by(trj_id) %>%
arrange(pingtimestamp) %>%
filter(row_number()==1) %>%
mutate(weekday = wday(pingtimestamp,
label=TRUE,
abbr=TRUE),
start_hr = factor(hour(pingtimestamp)),
day = factor(mday(pingtimestamp)))

6.2.2 Extracting trip destinations
The steps to extract the trip destinations are the same, except that arrange(desc(pingtimestamp)) is used to sort the pingtimestamp field in descending order instead. This lets us filter the row with the latest timestamp of each trip, giving us the destination location.
destination_df <- grab_df %>%
group_by(trj_id) %>%
arrange(desc(pingtimestamp)) %>%
filter(row_number()==1) %>%
mutate(weekday = wday(pingtimestamp,
label=TRUE,
abbr=TRUE),
end_hr = factor(hour(pingtimestamp)),
day = factor(mday(pingtimestamp)))

If you want to save the tidied data for future use, you can use the code chunk below:
write_rds(origin_df, "data/rds/origin_df.rds")
write_rds(destination_df, "data/rds/destination_df.rds")
When you want to import the data, you can use the code chunk below:
origin_df <- read_rds("data/rds/origin_df.rds")
destination_df <- read_rds("data/rds/destination_df.rds")
6.3 Converting tibble dataframe into sf tibble dataframe
Tibble data frames are not designed for spatial data and have no support for spatial operations or geometries. The sf package extends tibbles with spatial geometries, such as points, lines and polygons. Thus, to perform a wide range of spatial operations, we should convert our tibble data frame into an sf tibble data frame.
- st_as_sf() of sf: performs the conversion
  - coords argument: specifies the names of the columns holding the longitude and latitude coordinates; it is crucial that the longitude column comes first, followed by the latitude column
  - crs argument: specifies the source’s coordinate reference system
- st_transform() of sf: transforms the CRS to the appropriate one, EPSG: 3414
origin_sf <- st_as_sf(origin_df,
coords = c("rawlng", "rawlat"),
crs = 4326) %>%
st_transform(crs = 3414)
destination_sf <- st_as_sf(destination_df,
coords = c("rawlng", "rawlat"),
crs = 4326) %>%
st_transform(crs = 3414)

6.4 Visualising the Data
6.4.1 Visualising Frequency Distribution
The code below uses ggplot2 to generate a bar plot of the weekday variable in the origin_df data frame (and likewise for destination_df):
- ggplot(data = origin_df, aes(x = weekday)): sets up the ggplot object with origin_df as the data source and maps the weekday variable to the x-axis
- geom_bar(): adds a bar layer, creating a bar chart where the height of each bar represents the count of origin trips for each weekday
- geom_text(): adds text labels displaying the count of observations above each bar
  - aes(label = after_stat(count)): labels each bar with the count of observations
  - vjust = -0.1: adjusts the vertical position of the text labels
ggplot(data=origin_df,
aes(x=weekday)) +
geom_bar() +
geom_text(
aes(label = after_stat(count)),
stat = "count",
vjust = -0.1
)
ggplot(data=destination_df,
aes(x=weekday)) +
geom_bar() +
geom_text(
aes(label = after_stat(count)),
stat = "count",
vjust = -0.1
)
6.4.2 Visualising as Point Symbol Map
We use tmap to create the maps, plotting dots to represent the spatial points.
tmap_mode("plot")
tm_shape(origin_sf) +
tm_dots()
tmap_mode("plot")
tm_shape(destination_sf) +
tm_dots()
7. Geospatial Data Wrangling
7.1 Converting sf dataframe to sp Spatial* classes
While simple feature data frames are becoming more popular, many geospatial analysis packages still rely on sp’s Spatial* classes for input data. In this section, we convert our simple feature data frames into sp’s Spatial* classes.
The code chunk below uses as_Spatial() of the sf package to convert the three geospatial data sets from simple feature data frames to sp’s Spatial* classes.
origins <- as_Spatial(origin_sf)
destination <- as_Spatial(destination_sf)
mpsz <- as_Spatial(sg_sf)

mpsz
class : SpatialPolygonsDataFrame
features : 326
extent : 2667.538, 55941.94, 21448.47, 50256.33 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
variables : 6
names : SUBZONE_N, SUBZONE_C, PLN_AREA_N, PLN_AREA_C, REGION_N, REGION_C
min values : ADMIRALTY, AMSZ01, ANG MO KIO, AM, CENTRAL REGION, CR
max values : YUNNAN, YSSZ09, YISHUN, YS, WEST REGION, WR
origins
class : SpatialPointsDataFrame
features : 28000
extent : 3628.243, 49845.23, 25198.14, 49689.64 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
variables : 10
names : trj_id, driving_mode, osname, pingtimestamp, speed, bearing, accuracy, weekday, start_hr, day
min values : 10, car, android, 1554682166, -1, 0, 1, Fri, 0, 10
max values : 9984, car, ios, 1555889608, 30.9490566253662, 359, 728, Wed, 9, 9
destination
class : SpatialPointsDataFrame
features : 28000
extent : 3637.207, 49870.63, 25221.3, 49507.79 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
variables : 10
names : trj_id, driving_mode, osname, pingtimestamp, speed, bearing, accuracy, weekday, end_hr, day
min values : 10, car, android, 1554683192, -1, 0, 1.5, Fri, 0, 10
max values : 9984, car, ios, 1555891009, 36.4799995422363, 359, 1414, Wed, 9, 9
When inspecting the data, we can now see that the geospatial data have been converted into their respective sp’s Spatial* classes now.
7.2 Converting the Spatial* class into generic sp format
Since we want to use the spatstat package, the analytical data must be in ppp object form. There is no direct way to convert a Spatial* class into a ppp object, so we first need to convert the Spatial* classes into generic sp objects.
origin_sp <- as(origins, "SpatialPoints")
destination_sp <- as(destination, "SpatialPoints")
sg_sp <- as(mpsz, "SpatialPolygons")

origin_sp
class : SpatialPoints
features : 28000
extent : 3628.243, 49845.23, 25198.14, 49689.64 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
destination_sp
class : SpatialPoints
features : 28000
extent : 3637.207, 49870.63, 25221.3, 49507.79 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
sg_sp
class : SpatialPolygons
features : 326
extent : 2667.538, 55941.94, 21448.47, 50256.33 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
7.3 Converting the generic sp format into spatstat’s ppp format
Now we can convert the spatial data into spatstat’s ppp object format and use summary() to inspect the details of the created ppp objects.
origin_ppp <- as(origin_sp, "ppp")
plot(origin_ppp)
summary(origin_ppp)

Planar point pattern: 28000 points
Average intensity 2.473666e-05 points per square unit
Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units
Window: rectangle = [3628.24, 49845.23] x [25198.14, 49689.64] units
(46220 x 24490 units)
Window area = 1131920000 square units
destination_ppp <- as(destination_sp, "ppp")
plot(destination_ppp)
summary(destination_ppp)

Planar point pattern: 28000 points
Average intensity 2.493661e-05 points per square unit
Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units
Window: rectangle = [3637.21, 49870.63] x [25221.3, 49507.79] units
(46230 x 24290 units)
Window area = 1122850000 square units
7.3.1 Checking for Duplicates
Although there was no warning message, it is still good practice to check the ppp objects for duplicate points.
any(duplicated(origin_ppp))

[1] FALSE

any(duplicated(destination_ppp))

[1] FALSE
Since both returned FALSE, we can confirm there are no duplicate points we have to deal with.
If there are duplicate points to deal with, one solution is jittering, which adds a small perturbation to the duplicate points so that they no longer occupy the exact same location. For our data, that would look like this:
origin_ppp_jit <- rjitter(origin_ppp, retry=TRUE, nsim=1, drop=TRUE)
Alternatively, the simplest way to handle duplicates is to delete them, although this leads to some point events being lost.
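To illustrate on a toy example (hypothetical coordinates, not our Grab data), here is how duplicate detection and jittering behave on a ppp object with one deliberately duplicated point:

```r
library(spatstat)

# Toy pattern: the first two points coincide
toy_ppp <- ppp(x = c(1, 1, 3), y = c(2, 2, 4),
               window = owin(c(0, 5), c(0, 5)))

any(duplicated(toy_ppp))   # TRUE: two points share the same location
multiplicity(toy_ppp)      # how many times each point occurs

# Jitter the points slightly so no two coincide
toy_jit <- rjitter(toy_ppp, retry = TRUE, nsim = 1, drop = TRUE)
any(duplicated(toy_jit))   # should now be FALSE

# Or simply keep one copy of each duplicated point
toy_unique <- unique(toy_ppp)
```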
7.4 Creating Owin Object
When conducting spatial point pattern analysis, it is advisable to confine the analysis within a specific geographical area, such as the boundary of Singapore. In spatstat, a specialized object called owin is used to represent polygonal regions like Singapore’s boundary. This allows us to define the spatial extent of our analysis to the desired area and ensure the results are relevant. Thus, the code chunk below shows you how to convert the SG SpatialPolygon object into owin object of spatstat.
sg_owin <- as(sg_sp, "owin")
plot(sg_owin)
summary(sg_owin)

7.5 Combining PPP objects and Owin object
In the last step of data wrangling, we will extract the origin and destination trips that are located within Singapore.
originSG_ppp = origin_ppp[sg_owin]
plot(originSG_ppp)
destSG_ppp = destination_ppp[sg_owin]
plot(destSG_ppp)
8. Kernel Density Estimation
After a long process of data wrangling, we can finally derive the Kernel Density Estimation layers. Kernel density estimation places a smooth function, referred to as a “kernel”, over each data point and sums these contributions into a continuous density surface. This analysis helps us visualise the intensity of a point process and identify spatial trends and areas of high or low concentration.
8.1 Computing KDE using Automatic Bandwidth Selection method
Automatic bandwidth selection methods determine the optimal bandwidth parameter for KDE automatically, typically using statistical techniques that estimate the bandwidth from the characteristics of the data.
The bandwidth parameter controls the smoothing of the KDE surface and can greatly influence how the data points contribute to the estimate:
- A small bandwidth can lead to undersmoothing: a more localised estimate with higher variability
- A large bandwidth can lead to oversmoothing: a smoother estimate with lower variability
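To build intuition for these effects, we can compare a deliberately small and a deliberately large sigma on a simulated clustered pattern (a toy sketch using spatstat’s rMatClust(), not the Grab data):

```r
library(spatstat)

# Simulate a clustered point pattern in the unit square
set.seed(415)
sim_ppp <- rMatClust(kappa = 10, scale = 0.05, mu = 8)

# Small sigma: spiky, localised surface; large sigma: one smooth blob
par(mfrow = c(1, 2))
plot(density(sim_ppp, sigma = 0.01), main = "Small sigma: undersmoothed")
plot(density(sim_ppp, sigma = 0.5),  main = "Large sigma: oversmoothed")
```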
The code chunk below computes a kernel density using these configurations of density() of spatstat:
- sigma = bw.diggle: automatically selects the bandwidth for kernel density estimation; alternative methods include bw.CvL(), bw.scott() and bw.ppl()
- kernel = "gaussian": the smoothing kernel; other options such as "epanechnikov", "quartic" or "disc" are available
- edge = TRUE: applies edge-effect bias correction, based on Jones (1993) and Diggle (2010, equation 18.9)
kde_originSG_bw <- density(originSG_ppp,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
plot(kde_originSG_bw)
kde_destSG_bw <- density(destSG_ppp,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
plot(kde_destSG_bw)
You can retrieve the bandwidth used to compute the KDE layer using this code chunk:
bw <- bw.diggle(originSG_ppp)
8.1.1 Rescaling KDE Values
From their legends, we can see that the density values range from 0 to about 0.001, which is difficult to interpret due to the small magnitude. This is because the default unit of measurement for SVY21 is metres, so the density values are computed as “number of points per square metre”. Hence, we should rescale the data to make it more comprehensible.
originSG_ppp.km <- rescale(originSG_ppp, 1000, "km")
kde_originSG.bw <- density(originSG_ppp.km, sigma=bw.diggle, edge=TRUE, kernel="gaussian")
plot(kde_originSG.bw)
destSG_ppp.km <- rescale(destSG_ppp, 1000, "km")
kde_destSG.bw <- density(destSG_ppp.km, sigma=bw.diggle, edge=TRUE, kernel="gaussian")
plot(kde_destSG.bw)
8.2 Adaptive Bandwidth
The fixed bandwidth method in kernel density estimation is highly sensitive to skewed distributions of spatial point patterns across geographical units, such as urban versus rural areas. One way to address this issue is to employ an adaptive bandwidth instead. Adaptive bandwidths adjust according to the local density of points, allowing more flexibility in capturing spatial patterns across varying densities.
Here, we use the adaptive.density() function of spatstat.
kde_originSG_adaptive <- adaptive.density(originSG_ppp.km, method="kernel")
plot(kde_originSG_adaptive)
kde_destSG_adaptive <- adaptive.density(destSG_ppp.km, method="kernel")
plot(kde_destSG_adaptive)
8.3 Fixed Bandwidth
Upon observation of the Kernel Density Estimation (KDE) layers we have generated, areas of high density are not easily discernible. Therefore, we may opt to customise our bandwidth to enhance the clarity and precision of our analysis.
However, keep in mind that a fixed bandwidth may not adapt well to spatial patterns that vary across areas; using the same bandwidth for one data set may result in oversmoothing or undersmoothing, but it can help us identify the general trend. Here I have chosen a bandwidth of 800 m, and since we have already rescaled the data to kilometres, we write it as sigma = 0.8.
kde_originSG_800 <- density(originSG_ppp.km, sigma=0.8, edge=TRUE, kernel="gaussian")
plot(kde_originSG_800)
kde_destSG_800 <- density(destSG_ppp.km, sigma=0.8, edge=TRUE, kernel="gaussian")
plot(kde_destSG_800)
9. Converting KDE output into grid object
The results remain the same, but we convert the KDE output into a grid object that is suitable for mapping purposes.
gridded_kde_originSG_bw <- as.SpatialGridDataFrame.im(kde_originSG_800)
spplot(gridded_kde_originSG_bw)
gridded_kde_destSG_bw <- as.SpatialGridDataFrame.im(kde_destSG_800)
spplot(gridded_kde_destSG_bw)
9.1 Converting Gridded Output into Raster
We then convert the gridded kernel density objects into RasterLayer objects by using raster() of the raster package.
kde_originSG_bw_raster <- raster(gridded_kde_originSG_bw)
kde_destSG_bw_raster <- raster(gridded_kde_destSG_bw)

Take a look at the properties of the raster layers:

kde_originSG_bw_raster
class : RasterLayer
dimensions : 128, 128, 16384 (nrow, ncol, ncell)
resolution : 0.4162063, 0.2250614 (x, y)
extent : 2.667538, 55.94194, 21.44847, 50.25633 (xmin, xmax, ymin, ymax)
crs : NA
source : memory
names : v
values : -1.523218e-14, 310.3063 (min, max)
kde_destSG_bw_raster
class : RasterLayer
dimensions : 128, 128, 16384 (nrow, ncol, ncell)
resolution : 0.4162063, 0.2250614 (x, y)
extent : 2.667538, 55.94194, 21.44847, 50.25633 (xmin, xmax, ymin, ymax)
crs : NA
source : memory
names : v
values : -1.619782e-14, 299.8866 (min, max)
9.2 Assign Projection Systems
From the properties observed, we can see that the CRS property is missing, so let’s assign the proper CRS using the code below:
projection(kde_originSG_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_destSG_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")

9.3 Making a density map function
Now, we are going to plot the KDE raster layers over an OpenStreetMap basemap.
- tm_basemap("OpenStreetMap"): specifies the basemap to be used
- tm_raster("v", alpha=0.65): adds the raster object to the map; the argument "v" indicates the raster attribute to be visualised, and the alpha parameter sets the transparency of the raster layer to 0.65
- the rest of the parameters customise the layout of the map
density_map <- function(raster_object, map_title) {
tm_basemap("OpenStreetMap") +
tm_shape(raster_object) +
tm_raster("v", alpha=0.65) +
tm_layout(legend.position = c("right", "bottom"),
legend.height = 0.5,
legend.width = 0.4,
main.title = map_title,
main.title.position = 'center',
main.title.size = 1,
frame = FALSE)
}

9.3.1 Plotting KDE on OpenStreetMap
We call the density_map() function that we created earlier and input our raster layer.
origin_osm <- density_map(kde_originSG_bw_raster, map_title = 'Grab-hailing Origin Density Map')
destination_osm <- density_map(kde_destSG_bw_raster, map_title = 'Grab-hailing Destination Density Map')

Let’s see the visualisation!
tmap_arrange(origin_osm, destination_osm)
10. Kernel Density Maps Analysis
With the OpenStreetMap density maps, we can more clearly identify which areas have a higher concentration of both pick-ups and drop-offs. There are clearly highlighted areas in these maps. By comparing with the Singapore planning subzone map, we can see that the planning areas with the highest density lie in the Central Area.
The planning areas identified in this central area are:
- Downtown Core
- Outram
- Museum
The main Central Business District in Singapore is the Downtown Core; the Urban Redevelopment Authority’s website identifies it as the “economic and cultural heart of Singapore”. This could largely explain why most pick-ups and drop-offs happen here, since it has the highest consistent human traffic on a daily basis. Most Singaporeans work a five-day week, which makes travel to the CBD very frequent. Despite the presence of well-connected public transport options, usage of Grab-hailing services remains high in the CBD area. This could be attributed to several reasons, such as time efficiency, comfort, and the convenience of requesting rides directly from a smartphone to be picked up and dropped off at specific locations.
Other high-density areas are spread across the North, East and West. By comparing with the Singstat map of planning areas/subzones in Singapore, the planning areas we can roughly identify are Jurong West, Tampines and Woodlands. These three areas are primarily residential neighbourhoods with large housing estates and residential complexes, and the concentration of residential areas often leads to higher demand for transportation services.
Referring to Singstat’s Geographic Distribution of the Singapore Resident Population Report, it’s notable that Jurong West, Tampines, Bedok, Woodlands, and Hougang emerge as the top five residential planning areas with over 200,000 residents each. Given our understanding that the majority of origin and destination trips cluster around the CBD area, it’s logical to infer that densely populated residential zones also exhibit a relatively high density of ride-hailing origin and destination trips. It’s likely that many residents rely on ride-hailing services for their daily commute to and from work, contributing to the elevated demand for such services in these areas.
10.1 Comparing Spatial Point patterns using KDE
Most of the steps performed here have already been explained previously, but now we are extracting the specific planning areas that we would like to analyse!
10.1.1 Extracting Study Area
tm = mpsz[mpsz@data$PLN_AREA_N == "TAMPINES",]
jw = mpsz[mpsz@data$PLN_AREA_N == "JURONG WEST",]
dc = mpsz[mpsz@data$PLN_AREA_N == "DOWNTOWN CORE",]
wl = mpsz[mpsz@data$PLN_AREA_N == "WOODLANDS",]
Plotting the target areas:
par(mfrow=c(2,2))
plot(tm, main = "Tampines")
plot(jw, main = "Jurong West")
plot(dc, main = "Downtown Core")
plot(wl, main = "Woodlands")
10.1.2 Converting the spatial polygon dataframes into generic sp format
wl_sp = as(wl, "SpatialPolygons")
tm_sp = as(tm, "SpatialPolygons")
dc_sp = as(dc, "SpatialPolygons")
jw_sp = as(jw, "SpatialPolygons")
10.1.3 Creating Owin object
wl_owin = as(wl_sp, "owin")
tm_owin = as(tm_sp, "owin")
dc_owin = as(dc_sp, "owin")
jw_owin = as(jw_sp, "owin")
10.1.4 Combining Origin Points and Study Area
or_wl_ppp = originSG_ppp[wl_owin]
or_tm_ppp = originSG_ppp[tm_owin]
or_dc_ppp = originSG_ppp[dc_owin]
or_jw_ppp = originSG_ppp[jw_owin]
10.1.5 Rescale
or_wl_ppp.km = rescale(or_wl_ppp, 1000, "km")
or_tm_ppp.km = rescale(or_tm_ppp, 1000, "km")
or_dc_ppp.km = rescale(or_dc_ppp, 1000, "km")
or_jw_ppp.km = rescale(or_jw_ppp, 1000, "km")
Plotting the rescaled planning areas:
par(mfrow=c(2,2))
plot(or_tm_ppp.km, main = "Tampines")
plot(or_jw_ppp.km, main = "Jurong West")
plot(or_dc_ppp.km, main = "Downtown Core")
plot(or_wl_ppp.km, main = "Woodlands")
10.1.6 Computing Fixed Bandwidth KDE
par(mfrow=c(2,2))
plot(density(or_tm_ppp.km,
sigma=0.25,
edge=TRUE,
kernel="gaussian"),
main="Tampines")
plot(density(or_jw_ppp.km,
sigma=0.25,
edge=TRUE,
kernel="gaussian"),
main="Jurong West")
plot(density(or_dc_ppp.km,
sigma=0.25,
edge=TRUE,
kernel="gaussian"),
main="Downtown Core")
plot(density(or_wl_ppp.km,
sigma=0.25,
edge=TRUE,
kernel="gaussian"),
main="Woodlands")
From the visualisations above, we can observe that the two planning areas with the highest KDE values are Downtown Core and Jurong West.
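For intuition, a fixed-bandwidth Gaussian KDE simply sums a bivariate normal bump centred on each event. The sketch below is a minimal base-R illustration on synthetic points in a hypothetical 1 km × 1 km window, with no edge correction (unlike the `edge=TRUE` calls above); `gaussian_kde` and the coordinates are our own illustration, not spatstat internals.

```r
# Minimal base-R sketch of a fixed-bandwidth Gaussian KDE at one location.
# Coordinates are in km; sigma = 0.25 km mirrors the bandwidth used above.
gaussian_kde <- function(x0, y0, xs, ys, sigma) {
  d2 <- (xs - x0)^2 + (ys - y0)^2                      # squared distances to events
  sum(exp(-d2 / (2 * sigma^2)) / (2 * pi * sigma^2))   # intensity, events per km^2
}

set.seed(42)
xs <- runif(100); ys <- runif(100)   # 100 synthetic events in a 1 km square
gaussian_kde(0.5, 0.5, xs, ys, sigma = 0.25)
```

spatstat's `density.ppp()` performs the same summation over a whole pixel grid and, with `edge=TRUE`, compensates for kernel mass falling outside the window.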
10.2 Nearest Neighbour Analysis
In this section, we’ll conduct the Clark-Evans test of aggregation to assess the spatial point pattern of Grab origin locations using the clarkevans.test() function from the spatstat package.
The test hypotheses are as follows:
Ho: The distribution of origin locations is random.
H1: The distribution of origin locations is not random.
We will use a 95% confidence interval to evaluate the results.
This test helps us determine whether origin locations are clustered or dispersed across the study area. If the observed distribution significantly deviates from randomness, it suggests spatial aggregation or dispersion of origin locations.
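The statistic behind the test is straightforward: R is the mean observed nearest-neighbour distance divided by its expectation under complete spatial randomness, 0.5/sqrt(lambda), where lambda is the point density. A from-scratch base-R sketch on synthetic clustered points (hypothetical data, no edge correction; `clark_evans_R` is our own helper, not spatstat's):

```r
# From-scratch Clark-Evans R on synthetic points in a unit window: the ratio
# of the mean observed nearest-neighbour distance to the mean expected under
# complete spatial randomness, 0.5 / sqrt(lambda).
clark_evans_R <- function(xs, ys, area) {
  n <- length(xs)
  d <- as.matrix(dist(cbind(xs, ys)))   # pairwise distances
  diag(d) <- Inf
  d_obs <- mean(apply(d, 1, min))       # mean nearest-neighbour distance
  d_exp <- 0.5 / sqrt(n / area)         # expectation under CSR
  d_obs / d_exp                         # R < 1 => clustered, R > 1 => dispersed
}

set.seed(1)
# Clustered pattern: points drawn tightly around a few parent centres
cx <- rep(runif(5), each = 40); cy <- rep(runif(5), each = 40)
xs <- cx + rnorm(200, sd = 0.01); ys <- cy + rnorm(200, sd = 0.01)
clark_evans_R(xs, ys, area = 1)         # well below 1, indicating clustering
```

R well below 1 indicates clustering, near 1 randomness, and above 1 dispersion.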
10.2.1 Testing Spatial Point Patterns using Clark and Evans Test
# Whole Singapore
clarkevans.test(originSG_ppp,
correction="none",
clipregion="sg_owin",
alternative=c("clustered"),
nsim=99)
Clark-Evans test
No edge correction
Z-test
data: originSG_ppp
R = 0.27981, p-value < 2.2e-16
alternative hypothesis: clustered (R < 1)
The p-value is less than 0.05, indicating that there is sufficient evidence to reject our null hypothesis that the distribution is random.
Based on the Clark-Evans test results, there is strong evidence to suggest that the origin points of Grab rides are clustered rather than randomly distributed.
# Downtown Core
clarkevans.test(or_dc_ppp,
correction="none",
clipregion="sg_owin",
alternative=c("clustered"),
nsim=99)
Clark-Evans test
No edge correction
Z-test
data: or_dc_ppp
R = 0.47245, p-value < 2.2e-16
alternative hypothesis: clustered (R < 1)
# Jurong West
clarkevans.test(or_jw_ppp,
correction="none",
clipregion="sg_owin",
alternative=c("clustered"),
nsim=99)
Clark-Evans test
No edge correction
Z-test
data: or_jw_ppp
R = 0.33423, p-value < 2.2e-16
alternative hypothesis: clustered (R < 1)
The same conclusion can be drawn from the Clark-Evans tests on the Downtown Core and Jurong West planning areas.
11. Network Constrained Spatial Point Patterns Analysis
Network constrained Spatial Point Patterns Analysis (NetSPAA) is a collection of spatial point patterns analysis methods specially developed for analysing spatial point events that occur on or alongside a network. The spatial point events can be locations of traffic accidents or crimes, for example, while the network can be a road network or river network. This allows us to identify areas of high event occurrence or concentration along the network.
11.1 Installing and Loading Packages into R
There are a few other packages that we need to load into R:
spNetwork: to derive the network constrained kernel density estimation (NetKDE), and also to build spatial matrices for conducting traditional spatial analysis with spatial weights based on reticular distances
sp: provides classes and methods for dealing with spatial data in R, used to manage SpatialLinesDataframe and SpatialPointsDataFrame and for performing projection transformation
pacman::p_load(sp, sf, spNetwork, tmap)
11.2 Data Preparation
Since the road network data is very large and detailed, computing the NetKDE on the whole network would take a lot of computational power. Hence, we pick out only a subset of the whole data set for this analysis. Based on our earlier analysis, it’s evident that the Central Business District (CBD) exhibits the highest kernel density estimation (KDE) values. Hence, we’ll narrow our focus to the Downtown Core planning area, leveraging the wealth of insights available within this densely populated and highly active region.
11.2.1 Getting the road intersection for Downtown Core
First, we will use st_as_sf() to convert the Downtown Core Planning area into a sf object in order to perform the intersection.
dc_sf <- st_as_sf(dc)
Now, we can use st_intersection() of sf to perform a spatial intersection operation between the roads and Downtown Core planning area.
roads_dc will return us only the road segments that fall within the boundaries of the Downtown Core area.
roads_dc <- st_intersection(sg_roads3414, dc_sf)
plot(st_geometry(roads_dc))
origin_dc <- st_intersection(origin_sf,dc_sf)
plot(st_geometry(origin_dc))
To visualise the geospatial data interactively and with high cartographic quality, the mapping functions of the tmap package can be used as shown in the code chunk below. The dots are coloured red so they are easier to see.
tmap_mode('view')
tm_shape(origin_dc) +
tm_dots(col='red') +
tm_shape(roads_dc) +
tm_lines()
11.2.3 Preparing the lixels objects
The generation of lixel lines typically requires the input geometries to be LINESTRING. Thus, in the code chunk below, we extract the geometry type of each feature in the roads_dc data set with st_geometry_type() and check for the presence of any non-LINESTRING geometry types.
geometry_types <- st_geometry_type(roads_dc)
all_linestrings <- all(geometry_types == "LINESTRING")
all_linestrings
[1] FALSE
Since there are non-linestring geometry types, this code chunk will help us remove these features.
non_linestring_indices <- which(geometry_types != "LINESTRING")
# Remove features with non-linestring geometry types
roads_dc <- roads_dc[-non_linestring_indices, ]
We can now prepare the lixel objects. Before computing the NetKDE, the road lines have to be cut into lixels to split the road network into smaller, manageable units. We will use lixelize_lines() with a lixel length of 700m and a minimum distance of 350m to ensure that the lixels are not too short.
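Our understanding of the cutting rule can be sketched in base R: segments of the lixel length are cut off one after another, and a trailing remainder shorter than the minimum distance is merged into the last lixel. `lixel_lengths` is a hypothetical helper operating on lengths only; lixelize_lines() cuts the actual road geometries.

```r
# Base-R sketch of the assumed lixel-cutting rule: split a line of length L
# into lx-long lixels, merging a trailing remainder shorter than mindist
# into the last lixel, and leaving short lines uncut.
lixel_lengths <- function(L, lx = 700, mindist = 350) {
  if (L <= lx + mindist) return(L)      # too short to split
  n_full <- floor(L / lx)
  rem <- L - n_full * lx
  if (rem > 0 && rem < mindist) {       # merge short remainder into last lixel
    c(rep(lx, n_full - 1), lx + rem)
  } else if (rem == 0) {
    rep(lx, n_full)
  } else {
    c(rep(lx, n_full), rem)
  }
}

lixel_lengths(1800)   # 700, 700, 400
lixel_lengths(1600)   # 700, 900
```

The two calls show both branches: a 400m remainder (at least 350m) stays its own lixel, while a 200m remainder gets merged into the previous one.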
lixels <- lixelize_lines(roads_dc,
700,
mindist = 350)
We will use lines_center() of spNetwork to calculate the centre point of each line segment.
samples <- lines_center(lixels)
11.4 Performing NetKDE
Now, we can perform the NetKDE!
In this code, there are quite a few steps to explain:
nkde() of spNetwork
events parameter: the point events for which the KDE will be performed, which is origin_dc
w parameter: specifies the weights associated with each event point
quartic kernel function is used; other kernel methods supported by spNetwork are triangle, gaussian, scaled gaussian, tricube, cosine, triweight, epanechnikov and uniform
method argument: the simple method is used to calculate the NKDE; the distances between events and sampling points are replaced by network distances, and the formula of the kernel is adapted to calculate the density over a linear unit instead of an areal unit.
Other methods:
“discontinuous”, which equally divides the mass density of an event at intersections of lixels
“continuous”, which divides the mass of the density at the intersection but adjusts the density before the intersection to make the function continuous
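To make the kernel concrete, here is a base-R sketch of a quartic kernel evaluated at hypothetical network distances, summed at a single sample point the way the simple method would. The `quartic` helper and the distances are illustrative, not spNetwork internals; nkde() computes distances along the real road graph.

```r
# Quartic kernel over a 1D (linear) unit: nonzero only within the bandwidth,
# and integrating to 1 over [-bw, bw].
quartic <- function(d, bw) {
  u <- d / bw
  ifelse(abs(u) < 1, (15 / 16) * (1 - u^2)^2 / bw, 0)
}

net_dists <- c(50, 120, 280, 400)   # hypothetical network distances (m)
w <- rep(1, length(net_dists))      # unit weights, as in the nkde() call
sum(w * quartic(net_dists, bw = 300))   # events beyond 300 m contribute 0
```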
densities <- nkde(roads_dc,
events = origin_dc,
w = rep(1,nrow(origin_dc)),
samples = samples,
kernel_name = "quartic",
bw = 300,
div= "bw",
method = "simple",
digits = 1,
tol = 1,
grid_shape = c(1,1),
max_depth = 8,
agg = 5, #we aggregate events within a 5m radius (faster calculation)
sparse = TRUE,
verbose = FALSE)
11.5 Visualising NetKDE
The code chunk below will insert the computed density values into samples and lixels objects as density field.
Since SVY21 is in metres, the computed density values are very small, thus we have to rescale to number of events per kilometre once again.
samples$density <- densities
lixels$density <- densities
# rescaling to help the mapping
samples$density <- samples$density*1000
lixels$density <- lixels$density*1000
The following code utilizes suitable functions from the tmap package to create interactive maps with high-quality cartographic visualization.
tmap_mode('view')
tm_basemap(server = "OpenStreetMap", alpha=0.5)+
tm_shape(origin_dc)+
tm_dots()+
tm_shape(lixels)+
tm_lines(col="density")
The NetKDE layer visualizes the density of pick-up locations within the road network, highlighting areas with a higher frequency of pick-ups. By observing this map, Grab users and drivers can identify where pick-ups are more likely to occur, helping them plan routes or anticipate demand in certain areas.
According to the OpenStreetMap, these areas are:
- Near Raffles Place
- Near Bugis/Guoco Midtown
- Ophir Road
- Near Bayfront
Overlaying the NetKDE layer on the OpenStreetMap basemap provides an efficient way to pinpoint high-demand areas within the Downtown Core. Users can easily visualise and interpret density hotspots, which can inform their choice of pick-up locations during peak hours to avoid high-traffic areas. Depending on Grab’s pricing algorithm, choosing a less crowded pick-up location might also lead to lower fares, since there is less demand in that area.
12. Conclusion
In conclusion, the density levels of commuter origins and destinations are influenced by a multitude of factors. Urban mobility dynamics are complex, shaped by various socioeconomic, geographical, and infrastructural factors. From the analysis conducted, it becomes evident that the spatial distribution of commuter origins and destinations is not random, but rather influenced by factors such as population density, employment centres, residential areas, transportation infrastructure, and commercial hubs. The density maps provide valuable insights into commuter patterns, which can aid decision-making.